DOCUMENT RESUME

ED 045 709                                        TM 000 278

AUTHOR          Murray, James E.; Wiley, David E.
TITLE           New Statistical Techniques for Evaluating
                Longitudinal Models.
INSTITUTION     Chicago Univ., Ill.
SPONS AGENCY    Early Education Research Center, Chicago, Ill.
PUB DATE        Sep 70
NOTE            13p.; Paper presented as part of the Symposium
                "Models and Methods for the Study of the Life
                Cycle", given at the American Psychological
                Association Convention, Miami Beach, Florida,
                September 1970

EDRS PRICE      EDRS Price MF-$0.25 HC-$0.75
DESCRIPTORS     Behavioral Sciences, *Data Analysis, *Evaluation
                Techniques, Feedback, Goodness of Fit, *Longitudinal
                Studies, *Mathematical Models, Probability, Program
                Evaluation, *Research Methodology, *Statistical Data



ABSTRACT

A basic methodological approach in developmental
studies is the collection of longitudinal data. Behavioral data can
take at least two forms, qualitative (or discrete) and quantitative.
Both types are fallible. Measurement errors can occur in quantitative
data and measures of these are based on error variance. Qualitative
or discrete data can contain misclassification errors, and these are
expressed as probabilities of misclassification. Statistical models
for psychological data must take these differences into account. A
simple sequence is presented as an example of a qualitative model,
while disengagement is the model given as an example for quantitative
data. These examples, which are special cases of more general
problems, lead to an outline of the general nature of the qualitative
and quantitative models. The primary concern here is to develop
statistical models which permit the investigation of structure in
fallible longitudinal data. Statistical descriptions of the simple
sequence and disengagement models are included in the appendix. (CK)

Part I: General Description



A basic methodological approach in developmental studies is the 
collection of longitudinal data, i.e. observations on the same Ss at 
multiple points in time. Two often asked questions of such data are: 

1) Are there invariant sequences of behavioral phenomena?

2) What are the processes which cause variables to change 
over time? Or, what controls age related changes in 
variables? 

There are obviously a number of additional questions which can be 
asked of longitudinal data. However, these particular problems are quite 
general. These questions can be found in studies of childhood as well as 
in studies of old age. Furthermore, such questions can provide the 
statistician working with developmental data with a useful starting point. 
We have approached these questions as problems in statistical model 
building. 

Behavioral data can be of (at least) two forms, the first, qualitative 
or discrete, and the second, quantitative. Qualitative or discrete data is 
generally characterized by observations which are categorized into one of 
a number of mutually exclusive classes. Variables such as occupation, 
cognitive stage, or marital status are discrete or qualitative. Quantita- 
tive data, on the other hand, arise from observations which yield measures 
on a ratio scale. Examples of variables which are quantitative are weight, 
height and true-scale scores. 

Each of these two kinds of data is fallible, however. Quantitative
data can contain measurement error, and classical test reliability
theory has been developed to meet this statistical problem. The
measurement error in quantitative data is itself quantitative. Because
of this, a measure of the noise or error in a quantitative variable is
based on the error variance. Qualitative data is also fallible; its
error is referred to as misclassification error. Piaget's framework can
provide an example of a misclassification error. There can be a nonzero
probability that a child who is really at the preoperational stage of
cognitive development is observed or classified as being at the level
of concrete operations. Misclassification error for discrete data is
expressed as probabilities of misclassification, as opposed to the
error variance which characterizes quantitative data.

(J. Ward Keesling, Department of Education, UCLA, has been responsible for
much of the work on quantitative models.)

Statistical models for psychological data should take into account the
difference between quantitative and qualitative or discrete data.
Furthermore, each kind of model includes measurement error parameters
appropriate to the data form. The inclusion of such error parameters is
significant because the presence of measurement error renders some
statistical estimation procedures inaccurate. For example, the simple
least squares procedure used in regression analysis will not yield
correct regression weights when quantitative independent variables are
measured with error. Models have been developed for longitudinal data
which contain measurement error.
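
The least squares point can be illustrated with a small simulation. The sketch below is not part of the original paper: the sample size, the latent regression weight, and the variances are all assumed values chosen for illustration. It regresses a dependent variable first on the latent independent variable and then on a fallible measure of it.

```python
import numpy as np

def ols_slope(x, y):
    """Simple least squares slope of y on x."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(0)
n = 200_000
true_slope = 2.0               # assumed latent regression weight
var_true, var_err = 1.0, 1.0   # assumed true-score and error variances

xi = rng.normal(0.0, np.sqrt(var_true), n)          # latent independent variable
y = true_slope * xi + rng.normal(0.0, 0.5, n)       # dependent variable
x_obs = xi + rng.normal(0.0, np.sqrt(var_err), n)   # fallible measure of xi

# Reliability of x_obs in the classical test theory sense (0.5 here).
reliability = var_true / (var_true + var_err)

slope_latent = ols_slope(xi, y)        # close to true_slope
slope_observed = ols_slope(x_obs, y)   # close to true_slope * reliability
```

Under these assumptions the weight estimated from the fallible measure is shrunk toward zero by roughly the reliability of that measure, which is the sense in which least squares fails to recover the latent relationship.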

A Simple Sequence: An example of a qualitative model. Cognitive
development in the child has been seen, especially by Piaget, as essentially
a sequential process of passing through various qualitatively distinct stages
of cognitive organization. The verification of such an assertion requires
longitudinal data. Such data must meet the constraint that each child pass
through or possess cognitive stages in the proper order. An observed
sequence of stages for a sample of children, however, will probably show
some children who do not change in the predicted way. These observed
patterns which are theoretically inadmissible could be due simply to
misclassification error. We have developed a model for this problem
which includes misclassification error for each stage or qualitative
category. This model will allow the observed probabilities to be
'incorrect' yet have the underlying or latent process follow a strict
sequence at the same time. To obtain estimates of both the
misclassification errors on the one hand, and the underlying transition
rates between stages on the other hand, the method of maximum likelihood
is used. Furthermore, the overall goodness of fit of the
sequence-plus-misclassification error model is tested.

Disengagement: An example for quantitative data. The problem of
disentangling antecedent-consequent relationships among variables which
cannot be experimentally controlled is one which students of the
life-cycle regularly face. The 'disengagement hypothesis' of Cumming & Henry
is centrally concerned with such a problem. The question behind this
particular hypothesis concerns the relationship between the psychological
and social involvement of individuals as they enter the period of
retirement and old age. The 'bare-bones' of this hypothesis is that aging
naturally entails a withdrawal from society which is preceded or
anticipated, on the individual level, by an increasing psychological
focus on the self. Verification of such a hypothesis can be accomplished
by collection of longitudinal data regarding the degree of social
interaction as well as data regarding the degree of ego involvement in the
world of people and objects outside of the self. Once such data are
available, problems of analysis come to the foreground. There are two
major issues:

1) The data will have measurement error. That is to say,
the measures of social involvement and psychological
involvement will each be subject to error. In this case,
classical linear regression does not yield correct
estimates of the true or latent relationship between the
variables.

2) The underlying relationship between psychological involvement
and social involvement must be directly expressed in a
structural model. The basic antecedent-consequent relation
can be expressed by letting the level of social involvement
at a given time be linearly dependent on the level of
psychological involvement at the immediately preceding time.
This linear dependence over time is postulated to hold on
the latent or true part of the variables. That is, measured
social involvement is not a simple function of measured
psychological involvement since each measure has error.

We have assumed that these two variables can be quantitatively
measured. The model which includes structural equations and measurement
error can be estimated by using the method of maximum likelihood. A
test of goodness of fit is given by a likelihood ratio.

General Comments: The example of a qualitative model, the simple
sequence, and the example of a quantitative model, the disengagement
hypothesis, are only special cases of more general problems. Our work is
primarily concerned with developing statistical models which enable one
to investigate structure in fallible longitudinal data. Since many
substantive problems require different forms of data, for example, bone
growth vs. ego development, we have tried to extend structural models to
qualitative data as well as quantitative data. The general nature of
the models we are considering can be described as follows.

The general qualitative model is conceptually related to
Lazarsfeld's latent structure analysis. There are two major differences:

1) Items, the qualitative variables, have as many
misclassification parameters (probabilities) as they have categories
of response in our models. For example, dichotomous items,
e.g. yes/no, have two parameters, one for each corresponding
latent state. In Lazarsfeld's system, on the other hand,
items often are given only one misclassification parameter.

2) The latent classes of Lazarsfeld, which each have a latent
probability, are highly restricted in our own models. A
given latent class probability in Lazarsfeld's model is
expressed as a function of various latent probability
parameters in our models. Our approach to parameterization is
necessary in testing hypotheses which involve structured
processes among latent classes.

The general quantitative model is regarded as a covariance
structure model. The quantitative measures are assumed to be
distributed as multivariate normal vectors and our application of
maximum likelihood solutions corresponds to what the econometricians
call a full-information method for solving structural regression
problems. The quantitative model allows longitudinal data to be
structured a number of different ways. For example:

1) Feedback - 2 or more variables which each determine
one another over time can be examined;

2) Multiple cause-effect - the nature of simple
interdependence of many variables over time can be
studied; and

3) Systems of processes - the nature of complex chains
of dependencies among variables can be studied
with longitudinal data.

In conclusion, some comment on the general substantive relevance
of these kinds of statistical models should be made. Many developmental
psychologists claim that few, if any, tenable theories have been available
in this general field. Thus, most data analysis is really a hunting
expedition rather than a process of rigorous confirmatory study. By the
looks of things, this statement is on the whole descriptively accurate.
The role of our statistical models is not that of the divining rod for
scientific discoveries. It is our belief, however, that scientific
hunting is shown to be successful only in confirmatory analysis. Our
models are built to do confirmatory analysis in a way not previously
available.







Part II: Statistical Description

The Qualitative Model: The general form of the models for
qualitative data is given by

(1)  $p = Q\pi$, where

$p$ is an m x 1 vector of observed or manifest probabilities,

$Q$ is an m x m matrix of misclassification probabilities, and

$\pi$ is an m x 1 vector of latent class probabilities.

Q has the following characteristics:

A) For each of the k separate items used at a given time,
there is a separate matrix of misclassification
probabilities, say $Q_i$, $i = 1, \ldots, k$.

B) Each $Q_i$ is a matrix of conditional probabilities of the form

(2)  $q_{ij\ell} = P(r_j \mid \rho_\ell)$,

where $r_j$ is the $j$th manifest or observed response to item $i$,
$j = 1, \ldots, J_i$; $\rho_\ell$ is the $\ell$th true or latent category
for item $i$, $\ell = 1, \ldots, L_i$ ($L_i = J_i$); and

(3)  $\sum_{j=1}^{J_i} q_{ij\ell} = 1$.

C) Conditional independence of the response errors is assumed so
that, at the $t$th time of measurement,

(4)  $Q_t = Q_1 \otimes Q_2 \otimes \cdots \otimes Q_k$.

D) Independence of the response error probabilities with
respect to time of measurement is also assumed, so that

(5)  $Q = Q_t \otimes Q_t \otimes \cdots \otimes Q_t$.



E) Finally, since both conditional independence of response
errors over items and independence of time are assumed, there
are only L total independent parameters in Q, where

(6)  $L = \sum_{i=1}^{k} L_i (J_i - 1)$.

The construction of $\pi$ is totally dependent on the particular problem
being studied. The manner in which the latent class probabilities are
expressed is a result of the parameterization chosen, which itself is
intended to reflect the structural process and hypothesis being studied.

The Simple Sequence Example: Let us assume that there are four
possible stages or categories of cognitive functioning. Each child can
be placed in only one stage at any particular time of measurement.
Assume further that children are measured at 3 equally spaced points in
time. Lastly, the hypothesis of interest is that there is a strict
sequence of stages which each child must follow, e.g. Stage I → II → III → IV.
One parameterization which we have chosen involves the following
parameters:

A) 3 latent initial state parameters ($\alpha_1, \alpha_2, \alpha_3$) which are
the probabilities of the first 3 latent stages. Here the
probability of the latent stage at time 1 is denoted by
[illegible].

B) The remaining parameters are 3 transition probabilities
which express the probability of movement among the
latent stages: i.e. $P(II \mid I) = P_1$, $P(III \mid II) = P_2$
and $P(IV \mid III) = P_3$.

The transition probability matrix is:

(7)
                          Time 1
                  I        II       III      IV
  Time 2   I      1-P_1    0        0        0
           II     P_1      1-P_2    0        0
  T =      III    0        P_2      1-P_3    0
           IV     0        0        P_3      1

The zeroes in this matrix, T, serve to express the structural hypothesis

A) that there is no "regression," and

B) that there is no "stage-jumping."

For this model, since there are $4^3 = 64$ response patterns possible,

(8)  $p$ is the 64 x 1 vector of the manifest probabilities,

(9)  $Q$ is a 64 x 64 matrix of the form $Q = Q_1 \otimes Q_1 \otimes Q_1$,
$Q_1$ being the 4 x 4 matrix of misclassification
probabilities which is constant over time, and

(10)  $\pi$ is the 64 x 1 vector of latent class probabilities,
where each latent class is one of the 64 possible patterns
of stage membership. That is, denoting the $(j, k)$
element of T by $t_{jk}$, the latent response pattern
triple $\langle k, j, \ell \rangle$ has the probability

(11)  $\pi_{\langle k, j, \ell \rangle} = \alpha_k \, t_{jk} \, t_{\ell j}$.

The total number of parameters in the simple sequence model for four
stages is 18; 12 for $Q$ and 6 for $\pi$.

The estimates of these parameters are found by maximizing the likelihood
function defined for the qualitative model by means of the multinomial
distribution.
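
The pieces of the simple sequence model can be assembled numerically. The following is a minimal sketch, not the authors' program: all numerical values are hypothetical, the fourth initial probability is assumed to be one minus the sum of the other three, and the latent patterns are ordered with the time 1 stage varying slowest so that they line up with the Kronecker product in (9).

```python
import numpy as np
from itertools import product

def transition_matrix(p1, p2, p3):
    """Columns index the stage at one time, rows the stage at the next, as in (7)."""
    return np.array([
        [1 - p1, 0.0,    0.0,    0.0],
        [p1,     1 - p2, 0.0,    0.0],
        [0.0,    p2,     1 - p3, 0.0],
        [0.0,    0.0,    p3,     1.0],
    ])

def latent_probs(alpha, T):
    """64-vector pi over triples <k, j, l> (stages at times 1, 2, 3), as in (11)."""
    a = np.append(alpha, 1.0 - np.sum(alpha))  # assumed: fourth initial probability
    return np.array([a[k] * T[j, k] * T[l, j]
                     for k, j, l in product(range(4), repeat=3)])

def manifest_probs(Q1, pi):
    """p = Q pi with Q = Q1 (x) Q1 (x) Q1, as in (1) and (9)."""
    Q = np.kron(np.kron(Q1, Q1), Q1)
    return Q @ pi

def log_likelihood(counts, p):
    """Multinomial log-likelihood of pattern counts, up to an additive constant."""
    counts = np.asarray(counts, dtype=float)
    mask = counts > 0
    return float(np.sum(counts[mask] * np.log(p[mask])))

# Hypothetical parameter values, for illustration only.
alpha = np.array([0.4, 0.3, 0.2])
T = transition_matrix(0.5, 0.4, 0.3)
Q1 = np.full((4, 4), 0.05) + 0.80 * np.eye(4)  # each column sums to 1
pi = latent_probs(alpha, T)
p = manifest_probs(Q1, pi)
```

With these values only 12 of the 64 latent patterns have nonzero probability, while every manifest pattern, including theoretically inadmissible ones such as II, I, I, has positive probability. Maximum likelihood estimation would then amount to maximizing `log_likelihood` over the 18 parameters with a numerical optimizer, a step omitted here.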

The Quantitative Model: The general form of the models for
quantitative data is given by (12).

(12)  $y = \eta + \epsilon$

The structure of the model is on $\eta$, the true or latent variables:

(13)  $\eta = A\eta + \theta$, where $\theta$ is a vector of random variables,

(14)  $\eta = (I - A)^{-1}\theta$.

$A$, $\eta$ and $\theta$ can be treated as partitioned by time period, each
period having multiple variables at each time.

A) (15)  $\theta_j \sim N(\mu_j, \Phi_j)$, $j = 1, 2, \ldots, n$;

B) $\eta_j$, $\theta_j$ are m x 1 vectors, i.e. there are m variates
at each time.

C) $\mathrm{Cov}(\theta_j, \theta_{j'}) = 0$, for $j \neq j'$.

D) $A_{jj'}$ is an m x m matrix.

The nature of the matrix A is determined by the structure or
hypothesis being investigated. If A is restricted to be lower
triangular, then there is no feedback and the model is referred to
as a Lag Model. This type of model is most often used for longitudinal
data where K, the degree of the Lag, is defined by

(16)  $\eta_{t+1} = \sum_{j=t-k}^{t} A_j \eta_j + \theta_{t+1}$,

where $j$ denotes the $j$th time point. The general
structure of $\Sigma_y$, the covariance of the observed variables,
is given by (17):

(17)  $\Sigma_y = (I - A)^{-1} \Phi (I - A')^{-1} + \Psi$,

where $\Psi$ is a diagonal matrix of the error ($\epsilon$) variances,
and $\Phi$ is the covariance matrix of the random variables $\theta$.
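
The covariance structure of the observed variables is straightforward to compute once A and the two variance matrices are fixed. The sketch below assumes a hypothetical one-variable, three-occasion lag model of degree 1; the numerical values are illustrative and do not come from the paper.

```python
import numpy as np

def implied_covariance(A, Phi, Psi):
    """Sigma_y = (I - A)^{-1} Phi (I - A')^{-1} + Psi."""
    B = np.linalg.inv(np.eye(A.shape[0]) - A)
    return B @ Phi @ B.T + Psi

def is_lag_model(A):
    """True when A is lower triangular, i.e. the model has no feedback."""
    return bool(np.allclose(A, np.tril(A)))

# Hypothetical values: each occasion depends only on the preceding one.
A = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.7, 0.0]])
Phi = np.diag([1.0, 0.5, 0.5])   # assumed variances of the random variables theta
Psi = np.diag([0.2, 0.2, 0.2])   # assumed measurement error variances
Sigma = implied_covariance(A, Phi, Psi)
```

Here the covariance between the first and third occasions is the product of the two lag weights times the initial latent variance (0.8 x 0.7 x 1.0), which shows how the structural weights are carried into the observable covariances.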

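The Disengagement Example: The measures of social involvement
and psychological involvement will be denoted by $y_{2i}$ and $y_{1i}$
respectively, and assume there are three equally spaced observations
over time on both variables. The structural hypothesis is that the
latent or true amount of social involvement at time (t + 1) is a
linear function of the true level of psychological involvement at
time (t).

(18)  $y_{1i} = \eta_{1i} + \epsilon_{1i}$, where $i = 1, 2, 3$ and denotes time.

(19)  $\eta_{1(i+1)} = \alpha_i \eta_{1i} + \theta_{1(i+1)}$, a lag of degree 1, or

      $\eta_1 = A_1 \eta_1 + \theta_1$, with
      $A_1 = \begin{pmatrix} 0 & 0 & 0 \\ \alpha_1 & 0 & 0 \\ 0 & \alpha_2 & 0 \end{pmatrix}$.

(20)  $y_{2i} = \eta_{2i} + \epsilon_{2i}$

(21)  $\eta_{2(i+1)} = \lambda_i \eta_{1i} + \beta_i \eta_{2i} + \theta_{2(i+1)}$, or

      $\eta_2 = A_{12} \eta_1 + A_2 \eta_2 + \theta_2$, with
      $A_{12} = \begin{pmatrix} 0 & 0 & 0 \\ \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \end{pmatrix}$,
      $A_2 = \begin{pmatrix} 0 & 0 & 0 \\ \beta_1 & 0 & 0 \\ 0 & \beta_2 & 0 \end{pmatrix}$.

Writing $\eta = \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}$ and
$\theta = \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}$,

(22)  $A = \begin{pmatrix} A_1 & 0 \\ A_{12} & A_2 \end{pmatrix}$,

      $y = (I - A)^{-1}\theta + \epsilon$,

      $\Sigma_y = (I - A)^{-1} \Phi (I - A')^{-1} + \Psi$,

where $\Phi$ and $\Psi$ are diagonal.

There are a total of $(6 \cdot 7)/2 = 21$ statistics in $\Sigma_y$. There are
6 parameters in $A$, $(\alpha_1, \alpha_2, \lambda_1, \lambda_2, \beta_1, \beta_2)$,
6 parameters in $\Phi$, $(\sigma_{\theta 11}, \ldots, \sigma_{\theta 23})$, and
6 parameters in $\Psi$, $(\sigma_{\epsilon 11}, \ldots, \sigma_{\epsilon 23})$.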



